 zeroth-order adaptive momentum method


ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization

Neural Information Processing Systems

The adaptive momentum method (AdaMM), which uses past gradients to update descent directions and learning rates simultaneously, has become one of the most popular first-order optimization methods for solving machine learning problems. However, AdaMM is not suited for solving black-box optimization problems, where explicit gradient forms are difficult or infeasible to obtain. In this paper, we propose a zeroth-order AdaMM (ZO-AdaMM) algorithm that generalizes AdaMM to the gradient-free regime. We show that the convergence rate of ZO-AdaMM for both convex and nonconvex optimization is roughly a factor of $O(\sqrt{d})$ worse than that of the first-order AdaMM algorithm, where $d$ is the problem size. In particular, we provide a deep understanding of why Mahalanobis distance matters in the convergence of ZO-AdaMM and other AdaMM-type methods. As a byproduct, our analysis makes the first step toward understanding adaptive learning rate methods for nonconvex constrained optimization. Furthermore, we demonstrate two applications: designing per-image and universal adversarial attacks on black-box neural networks. We perform extensive experiments on ImageNet and empirically show that ZO-AdaMM converges much faster to a solution of high accuracy than $6$ state-of-the-art ZO optimization methods.
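To make the abstract's idea concrete, here is a minimal sketch of what a zeroth-order adaptive momentum loop could look like for an unconstrained problem. It is not the authors' reference implementation. It assumes an AMSGrad-style form of AdaMM (momentum, a running second-moment estimate, and a running maximum of that estimate) and replaces the true gradient with a forward-difference estimate along a random unit direction. The function names `zo_gradient` and `zo_adamm`, the hyperparameter defaults, and the small `eps` term are illustrative assumptions. The projection step used for constrained problems is omitted.

```python
import numpy as np


def zo_gradient(f, x, mu, rng):
    """Forward-difference gradient estimate along one random unit direction.

    Only function values are used. The d / mu scaling is an assumed
    convention for this kind of estimator.
    """
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere
    return (d / mu) * (f(x + mu * u) - f(x)) * u


def zo_adamm(f, x0, steps=2000, lr=0.05, beta1=0.9, beta2=0.99,
             mu=1e-3, eps=1e-8, seed=0):
    """Unconstrained adaptive momentum loop driven by zeroth-order estimates.

    Hypothetical sketch: all defaults are illustrative, not tuned values.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)      # momentum (first-moment estimate)
    v = np.zeros_like(x)      # second-moment estimate
    v_hat = np.zeros_like(x)  # running maximum of v (AMSGrad-style)
    for _ in range(steps):
        g = zo_gradient(f, x, mu, rng)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)
        # Per-coordinate adaptive step; eps guards against division by zero.
        x = x - lr * m / (np.sqrt(v_hat) + eps)
    return x


def shifted_quadratic(x):
    """Toy black-box objective with its minimum at the all-ones vector."""
    return float(np.sum((x - 1.0) ** 2))


if __name__ == "__main__":
    x_start = np.zeros(10)
    x_final = zo_adamm(shifted_quadratic, x_start)
    print("objective before:", shifted_quadratic(x_start))
    print("objective after: ", shifted_quadratic(x_final))
```

Each iteration costs two function evaluations and no gradient evaluations, which is what makes this style of method usable when the objective is only available as a black box.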


Reviews: ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization

Neural Information Processing Systems

This paper proposes a zeroth-order adaptive momentum method for black-box optimization, approximating the stochastic gradient by the forward difference of two function values along a random unit direction. The paper also gives a convergence analysis of ZO-AdaMM in terms of Mahalanobis distance for both unconstrained and constrained nonconvex optimization, as well as for constrained convex optimization. The resulting sublinear convergence rates are roughly a factor of the square root of the dimension worse than those of first-order AdaMM. The proposed scheme is quite interesting: it approaches (non)convex optimization from a new perspective and provides some new insight into adaptive momentum methods. In particular, the paper formally concludes that Euclidean projection may result in non-convergence in stochastic optimization. The paper also shows applications to black-box adversarial attack problems and validates the method by comparing it with other ZO methods.
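The gradient approximation the review describes can be sketched as follows. This is a minimal illustration, assuming the commonly used form (d / mu) * (f(x + mu * u) - f(x)) * u with u drawn uniformly from the unit sphere. The review text does not state the formula, so the scaling, the smoothing parameter `mu`, and the helper names `zo_gradient_estimate` and `averaged_estimate` are assumptions.

```python
import numpy as np


def zo_gradient_estimate(f, x, mu=1e-3, rng=None):
    """Estimate the gradient of f at x from two function values.

    Assumed form: (d / mu) * (f(x + mu * u) - f(x)) * u, where u is a
    random unit vector and mu is a small smoothing parameter.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # normalize to a random unit direction
    return (d / mu) * (f(x + mu * u) - f(x)) * u


def averaged_estimate(f, x, num_samples, mu=1e-3, seed=0):
    """Average several independent estimates to reduce their variance."""
    rng = np.random.default_rng(seed)
    estimates = [zo_gradient_estimate(f, x, mu, rng)
                 for _ in range(num_samples)]
    return np.mean(estimates, axis=0)


if __name__ == "__main__":
    a = np.array([1.0, -2.0, 0.5, 3.0, 0.0])

    def linear(x):
        # Toy objective whose true gradient is the constant vector a.
        return float(a @ x)

    print("true gradient:    ", a)
    print("averaged estimate:", averaged_estimate(linear, np.zeros(5), 20000))
```

A single estimate is very noisy because it only probes one direction. Its variance grows with the dimension d, which gives some intuition for the dimension-dependent slowdown in the convergence rates mentioned above.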


ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization

Chen, Xiangyi, Liu, Sijia, Xu, Kaidi, Li, Xingguo, Lin, Xue, Hong, Mingyi, Cox, David

Neural Information Processing Systems
